Abstract. The computational paradigm represented by Cellular Neural/nonlinear Networks (CNN) and 
the CNN Universal Machine (CNN-UM) as a Cellular Wave Computer, gives new perspectives for com- 
putational physics. Many numerical problems and simulations can be elegantly addressed on this fully 
parallelized and analogic architecture. Here we study the possibility of performing stochastic simulations 
on this chip. First a realistic random number generator is implemented on the CNN-UM, and then as an 
example the two-dimensional Ising model is studied by Monte Carlo type simulations. The results obtained 
on an experimental version of the CNN-UM with 128 x 128 cells are in good agreement with the results 
obtained on digital computers. Computational time measurements suggest that the developing trend of 
the CNN-UM chips - increasing the lattice size and the number of local logic memories - will assure an 
important advantage for the CNN-UM in the near future. 

PACS. 07.05.Tp Computer modeling and simulation - 05.10.Ln Statistical physics and nonlinear dynamics 
- 89.20.Ff Computer science and technology 

1 Introduction

Many areas of science, and especially physics, are facing serious problems concerning the computing power of the presently available computers. Solving more and more complex problems, simulating large systems, analyzing huge datasets for which even storing represents a problem, are just a few examples which remind us that computing power needs to keep up with its exponential growth, as expressed by Moore's law [1]. We know however that this process cannot continue much further solely with the classical digital computers and new computational paradigms are necessary. Parallel computing,
grid computing and quantum computing are just the most popular examples. The goal of the present article is to make the physicist community aware of a modern and promising trend which computational scientists and engineers call Cellular Wave Computers [2]. This computer is based on the Cellular Neural/nonlinear Network (CNN) and it is experimentally realized by different physical principles in the architecture of the CNN Universal Machine (CNN-UM). Possibilities of performing fast image processing [3], solving partial differential equations in an elegant manner, or studying cellular automata models [?] on CNN were already studied. Here we argue that the CNN architecture is also appropriate for Monte Carlo (MC) type simulations on lattice models. As a specific example we study, on an experimental version of the CNN-UM (the ACE16K chip, which has 128 x 128 cells), the well-known second-order phase transition in the two-dimensional Ising model. Due to the fact that some simple operations are not included in this experimental hardware implementation, on this chip the speed of the simulations is in the range of modern PC type computers. We will argue however that the developing trend of this new hardware (2 and 3 layer complex cell CNN-UM architectures, and a powerful new visual microprocessor coming out soon at AnaFocus Ltd.) could substantially increase the speed of such simulations, assuring an important advantage for CNN computing.

2 The CNN Universal Machine

The theory of cellular neural/nonlinear networks (CNN) appeared in 1988, but the hardware based on this theory, like the CNN Universal Machine (CNN-UM) [5], is just now developing. The CNN-UM is an analogic (analog+logic) computer which has on its main processor several thousands of interconnected computational units (cells), working in parallel. The CNN-UM can be easily connected to any PC type computer and programmed through a special programming language [10]. This new kind of hardware does not replace digital computers, but due to its special structure and architecture it could represent an excellent platform for solving some complex problems of physics which demand high computational power. The CNN-UM is also extremely useful as a visual or tactile topographic microprocessor.

The standard CNN is composed of L x L cells placed on a square lattice and interconnected through their 8 neighbors. Each cell is an electronic circuit in which the most important element is a capacitor. The voltage of this capacitor is called the state value of the cell, $x_{ij}(t)$. The cell also has an input value (voltage) $u_{ij}$, which is constant in time and can be defined at the beginning of an operation. The third characteristic of the cell is the output value $y_{ij}(t)$. This is equivalent to the state value $x_{ij}$ in a given range. More specifically it is a piece-wise linear function, bounded between -1 (white) and 1 (black):

$$y = f(x) = \frac{1}{2}\left(|x+1| - |x-1|\right)$$

The connections between the cells are realized with 
voltage-controlled resistors, so that the state value 
of each cell depends on the input and output values of the connected neighbors. The state equation of the CNN cells, resulting from the time-evolution of the equivalent circuit (supposing the 8 neighbor interactions) is the following:

$$\frac{dx_{ij}(t)}{dt} = -x_{ij}(t) + \sum_{k=-1}^{1}\sum_{l=-1}^{1} A_{k,l}\, y_{i+k,j+l}(t) + \sum_{k=-1}^{1}\sum_{l=-1}^{1} B_{k,l}\, u_{i+k,j+l} + z_{ij} \qquad (1)$$

The coupling between neighbors can be controlled with matrices A and B. Within the standard CNN (and on the hardware realized up to the present day) A and B are the same for all cells. Parameters $z_{ij}$ are constant values and can vary from cell to cell. The set of parameters {A, B, z} is called a template. An operation is performed by giving the initial states of the cells, the input image (the input values of all cells) and by defining a template. The states of all cells will vary in parallel and the result of the operation will be the final steady state of the CNN. Each operation is equivalent to solving a differential equation defined by the template itself, with the extra condition that the state of a cell remains bounded in the [-1,1] region [11].

The CNN-UM is a programmable cellular wave computer in which each cell contains additionally a local analog and a logic unit, local analog and logic memories, and a local communication and control unit. Besides these local units, the CNN-UM also has a global analog programming unit which controls the whole system, making it a programmable computer. It can be easily connected to PC type computers and programmed with special languages, for example the analogic macro code (AMC).

3 Random number generators on the CNN-UM

Many applications ideal for the analogic (analog & logic) architecture of the CNN-UM were already developed and tested. For practical purposes the most promising applications are for image processing, robotics or sensory computing purposes [3]. The CNN architecture also seems promising when considering complex problems in natural sciences. Studies dealing with partial differential equations (PDE) [?] or cellular automata (CA) models [?] prove this. Solving partial differential equations is relatively easy and offers the advantage of continuity in time [?]. Deterministic cellular automata with simple nearest-neighbor rules are also straightforward to implement in the CNN architecture. In physics however, many of the interesting problems deal with stochastic cellular automata, random initial conditions or other MC methods on lattices (spin problems, population dynamics models, lattice gas models, percolation etc.). Developing and proving the efficiency of stochastic simulation techniques on the CNN-UM, using its stored (or algorithmic) programmability, would thus be an important step toward success.

It is known that for a successful stochastic simulation the crucial starting point is a good random number generator (RNG). While computing with digital processors, the "world" is deterministic and discretized, so in principle there is no possibility to generate quickly random events and thus real random numbers. The implemented
RNGs are all pseudo-random number generators working with a deterministic algorithm, and it is believed that their statistics approximate real random numbers well. The reproducibility of the pseudo-random numbers can sometimes be an advantage (debugging the code) but in many cases it presents a serious disadvantage. A first advantage of the analog architecture is the possibility to use the natural noise on the device and to generate real random numbers.

There are relatively few papers presenting or using RNGs on the CNN-UM [?]. The known and used ones are all pseudo-random number generators based on chaotic Cellular Automaton (CA) type update rules, generating binary images with 1/2 probability of the black and white pixels (logical 1 and 0, respectively). They were used mainly in cryptography [7] and watermarking on pictures [14]. In a recent paper [15] we presented a realistic RNG by using the natural noise of the chip. An algorithm for generating binary images with any probability of the black pixels was also described. Here we present briefly this realistic RNG and for more details we recommend [15].

The natural noise of the CNN-UM chip is usually highly correlated in space and time, so it cannot be used directly to generate random binary values. Our method is based on a chaotic CA perturbed with the natural noise of the chip. The random nature of the noise eliminates the deterministic properties of the chaotic CA.

As a starting point the relatively simple but efficient chaotic CA presented by Crounse et al. [7] and Yalcin et al. [?], called the PNP2D, was chosen. This chaotic CA is based on the following update rule

$$x_{t+1}(i,j) = \left(x_t(i+1,j) \lor x_t(i,j+1)\right) \oplus x_t(i-1,j) \oplus x_t(i,j-1) \oplus x_t(i,j), \qquad (2)$$

where i,j are the coordinates of the cells, the index t denotes time-steps and x is a logic value 0 or 1 (representing white and black pixels, respectively). Symbols $\lor$ and $\oplus$ stand for the logical operations or and exclusive-or (XOR), respectively. As described by the authors this chaotic CA is relatively simple and fast, it passed all important RNG tests and shows very small correlations. It generates binary values 0 and 1 with the same 1/2 probability independently of the starting condition. It is a good candidate for a pseudo-random number generator and our first goal is to transform it into a realistic RNG. The way to do this is relatively simple. After each time step the result P(t) of the chaotic CA is perturbed with a noisy binary picture (array) N(t) so that the final output is given as: $P'(t) = P(t) \oplus N(t)$. The symbol $\oplus$ stands again for the logical operation XOR, i.e. pixels which are different on the two pictures will become black (logic value 1). This operation assures that no matter what N(t) looks like, the density of black pixels remains the same 1/2. Because the noisy images used contain only very few black pixels (logic values 1) we just slightly sidetrack the chaotic CA from the original deterministic path and all the good properties of the pseudo-random number generator will be preserved.

The N(t) noisy picture is obtained by the following simple algorithm. All pixels of a gray-scale image are filled
up with a constant value a and a cut is performed at a threshold a + z, where z is a relatively small value. In this manner all pixels which have a smaller value than a + z will become white (logic value 0) and the others black (logic value 1). Like all the logic operations this operation can also be easily represented by a CNN template. Since the CNN-UM chip is an analog device, there will always be a natural noise on the gray-scale image. Choosing thus a proper z value one can always generate a random binary picture with few black pixels. These N(t) pictures might be strongly correlated and will fluctuate in time. The time-like fluctuations are caused by real stochastic processes in the transistor circuits of the chip and thus cannot be controlled. They are the source of a convenient random perturbation on the chaotic CA, and are responsible for the realistic nature of the RNG. In case one would need a repeatable series of pseudo-random numbers the chaotic CA is simply not perturbed by the N(t) noisy picture.

Using now n independent random binary images with 1/2 density of the black pixels, it is possible to generate pictures with any p probability of the black pixels (p being a number represented by n bits when expressed as a sum of powers of 1/2). For more details see [15].

This RNG and the described algorithms were tested and are properly working on an ACE16K chip which is an experimental version of the CNN-UM with 128 x 128 cells. It is found that the RNG with p = 0.5 is already almost 5 times faster on the ACE16K than on modern PC type digital computers. Generating images with other p probabilities is of course slower, depending on n (see [15]). Taking into account thus the natural trend that the lattice size of CNN-UM chips will be growing and that calculations on this chip are totally parallel, these results predict a promising trend. Some codes and movies about the RNGs on the ACE16K chip are available on the home-page dedicated to this study [10].

4 Studying the Ising model on the CNN-UM

Once a properly working RNG is available, Monte Carlo type simulations on two-dimensional lattice-type models are possible. Generating random initial conditions for cellular automata models is straightforward and many simple stochastic lattice models can be relatively easily solved [15]. Here we consider the well-known two-dimensional Ising model. Implementing the MC study of this model on the CNN-UM is however not trivial. As will be shown later, a straightforward application of the usual Glauber [16] or Metropolis [17] algorithms could lead to problems due to the parallel architecture of the computer.

In the Ising model the spins can have two possible states $\sigma = \pm 1$. On the CNN-UM these states can be mapped on the "black" and "white" states of the cells. Without an external magnetic field the Hamiltonian of the system is

$$H = -J \sum_{\langle i,j \rangle} \sigma_i \sigma_j, \qquad (3)$$

$\langle i,j \rangle$ representing nearest neighbors. There are many different MC type methods for studying this basic lattice model. Most of them, like the Metropolis or the Glauber algorithm, are of serial nature, meaning that in each step we
update one single spin. Working however in parallel with all spins could create some unexpected problems due to the fact that nearest neighbors are updated simultaneously. Imagine for instance an initial state where the spin values are assigned using a chessboard pattern. This state will have a zero total magnetization. Let us consider now the zero-temperature case and the Glauber or Metropolis algorithm. Contrary to what is expected, this system will not order in a simple ferromagnetic phase but it will continuously switch between the two opposite chessboard patterns. For eliminating the parallel update of the neighbors which causes such problems, but still taking advantage of the parallel nature of the computer, we impose an extra chessboard mask on the system. In each odd (even) step we update in parallel the spins corresponding to the black (white) cells of the chessboard mask. For the chosen spins the simple Metropolis algorithm is used. It is simple to realize that our method is equivalent to the classical serial Metropolis dynamics in which the spins are updated in a well-defined order. Detailed balance and ergodicity are valid, so the obtained statistics should be the right one.

Implementing the above scheme on the CNN-UM is realized as follows. In each step we first build three additional masks: the first marks the spins with 4 similar neighbors (a flip costs an energy $\Delta E = 8J$), the second one marks the spins with 3 similar neighbors ($\Delta E = 4J$), and the third represents all the other spins, for which $\Delta E \le 0$. Separating these cells is relatively easy using logic operations and some special templates which can shift the images in different directions (e.g. shifting to the right can be realized by the template: A = {0,0,0,0,2,0,0,0,0}, B = {0,0,0,1,0,0,0,0,0}, z = 0). We generate two random images with probability exp(-8J/kT) and exp(-4J/kT) and we perform an AND operation between each random image and the corresponding mask. After uniting the results of these two and the third mask ($\Delta E \le 0$) we get a new mask which marks all spins which have to be flipped. Finally we use the chessboard mask and allow only those spins to flip which correspond to black (white) pixels if the time-step is odd (even). The CNN code developed for studying this problem can also be downloaded from the home-page dedicated to this study [10]. It is worth mentioning that cluster algorithms, like the ones proposed by Swendsen and Wang [?] or Wolff [?], also seem to be appropriate for the parallel architecture of the CNN-UM.

Simulation results obtained with the Metropolis type algorithms are sketched in Fig. 1. In this figure we compare results of (i) the classical Metropolis algorithm on a digital computer, (ii) our parallel algorithm simulated on a digital computer and (iii) the results obtained on the ACE16K chip. By plotting the average magnetization, the specific heat and the susceptibility as a function of the temperature one can conclude that the different results are in good agreement with each other. All simulations were performed on a 128 x 128 lattice using free boundary conditions.

Fig. 1d plots the time needed for 1 MC step as a function of the lattice size L. While on a PC type computer this scales as $L^2$, on the CNN-UM the time does not depend on the lattice size (each command is executed in a
fully parallel manner on the whole lattice). The time measured on the ACE16K chip with L = 128 was 4.8 ms, while on a Pentium 4 PC working at 2.4 GHz under the Linux operating system the time needed for 1 MC step was 2 ms. For this lattice size the simulations are still faster on the classical digital computers; however, considering the trend that the size of the CNN chips (Table 1) will increase in the near future, the results are still promising.

Fig. 1. Average magnetization M (a), specific heat $C_v$ (b) and susceptibility $\chi$ (c) are plotted as a function of the temperature T for the classical Metropolis algorithm on a digital computer (squares), our parallel algorithm simulated on a digital computer (triangles) and the algorithm simulated on the ACE16K CNN-UM chip (circles). Panel (d) compares the simulation time t (in ms) needed for 1 MC step on a 2.4 GHz Pentium 4 PC (squares) and on the CNN-UM (circles) as a function of the lattice size L. The filled circle marks the simulation time obtained on the ACE16K chip (L = 128).
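The chessboard-masked parallel Metropolis update used in this section can be emulated on a digital computer along the following lines. This is only a minimal sketch under stated assumptions, not the CNN code of the study: it is written in Python with NumPy, all function names are hypothetical, NumPy's generator stands in for the chip RNG, and periodic boundaries (with even L) are assumed for brevity, whereas the simulations in the text used free boundaries.

```python
import numpy as np

def chessboard(L):
    """Boolean L x L mask, True on the 'black' cells of a chessboard."""
    i, j = np.indices((L, L))
    return (i + j) % 2 == 0

def count_similar_neighbors(spins):
    """Number of nearest neighbors (out of 4) sharing each site's spin.

    Assumption: periodic boundaries via np.roll, chosen for brevity.
    """
    count = np.zeros(spins.shape, dtype=int)
    for axis in (0, 1):
        for shift in (-1, 1):
            count += np.roll(spins, shift, axis=axis) == spins
    return count

def parallel_step(spins, t, J, kT, rng):
    """One masked parallel Metropolis update at time step t (requires kT > 0)."""
    n_sim = count_similar_neighbors(spins)
    mask8 = n_sim == 4   # a flip would cost 8J
    mask4 = n_sim == 3   # a flip would cost 4J
    mask0 = n_sim <= 2   # a flip costs nothing or lowers the energy
    # Random binary images with black-pixel probability exp(-8J/kT), exp(-4J/kT).
    rand8 = rng.random(spins.shape) < np.exp(-8.0 * J / kT)
    rand4 = rng.random(spins.shape) < np.exp(-4.0 * J / kT)
    flip = (mask8 & rand8) | (mask4 & rand4) | mask0
    # Odd steps act on the black cells, even steps on the white cells.
    board = chessboard(spins.shape[0])
    allowed = board if t % 2 == 1 else ~board
    return np.where(flip & allowed, -spins, spins)

rng = np.random.default_rng(0)
L = 8
spins = np.where(chessboard(L), 1, -1)    # chessboard initial state, M = 0
for t in (1, 2):                           # two steps = one full lattice sweep
    spins = parallel_step(spins, t, J=1.0, kT=0.1, rng=rng)
print(abs(spins.mean()))
```

Starting from the chessboard state at low temperature, the masked update orders the lattice within one sweep instead of switching between the two opposite chessboard patterns, which is the problem the extra mask is meant to remove.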



Name      Year               Size
          1993               12 x 12
ACE440    1995               20 x 22
POS48     1997               48 x 48
ACE4k     1998               64 x 64
CACE1K    2001               32 x 32 x 2
ACE16K    2002               128 x 128
XENON     2004               128 x 96 x 2
EYE-RIS   2005               176 x 144
CACE2K    under fabrication  32 x 32 x 3


TABLE 1. Evolution of the CNN-UM chip: different physical 
realizations. Of these chips only the ACE16K is commercially 
available; mass production is expected to begin with the 
EYE-RIS at the end of 2006. 

It is also worth mentioning here that the ACE16K chip was developed mainly for image processing purposes: the cells have only 2 Local Logic Memories (LLM) and 8 Local Analog Memories (LAM). While performing logic operations on our binary images we always had to copy the images to the LLMs and then save the results again to LAMs. These copying processes used around 3/4 of the processing time. Most of this lost time could be, and hopefully will be, eliminated in the future by increasing the number of available LLMs. One must also not forget that the CNN-UM was developed mainly for analog signal processing and the main strength of these chips is related to gray-scale operators. In that area the proven speed advantage is about three orders of magnitude [?].
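As a rough illustration of what removing the copying overhead could mean, assume (idealizing) that the roughly 3/4 of the processing time spent on copying disappears entirely and nothing else changes. The 4.8 ms measured for 1 MC step on the ACE16K would then become approximately

$$t' \approx \left(1 - \tfrac{3}{4}\right) \times 4.8\ \mathrm{ms} = 1.2\ \mathrm{ms},$$

which is below the 2 ms measured on the 2.4 GHz Pentium 4 PC for L = 128. This is an estimate under the stated assumption, not a measurement.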
5 Conclusions

In the present study we worked with binary images and we exploited mainly the parallel and connectivity features of the CNN. Our results suggest that the special architecture makes the Cellular Wave Computers very appropriate for simulating lattice models and that their natural noise can be effectively used in stochastic simulations. The ongoing development of this hardware is expected to increase the number of cells and local memories, and three-dimensional chips with more layers of cells are also expected to appear. This would assure an important advantage for these chips in the near future. We think that CNN computing could be effectively used in computational physics for supplementing digital computers in some complex and time-consuming problems.
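To make the use of noise in the RNG of Section 3 more concrete, the noise-perturbed chaotic CA can be emulated on a digital computer as sketched below. This is a minimal illustration in Python with NumPy, not the chip implementation: the function names are hypothetical, periodic edges are assumed, and a sparse pseudo-random image (with an arbitrarily chosen density of 0.001) stands in for the physical noise image N(t), so this emulation is of course still deterministic.

```python
import numpy as np

def pnp2d_step(x):
    """One update of the chaotic CA rule on a boolean array (periodic edges assumed).

    x_{t+1}(i,j) = (x_t(i+1,j) OR x_t(i,j+1)) XOR x_t(i-1,j) XOR x_t(i,j-1) XOR x_t(i,j)
    """
    down = np.roll(x, -1, axis=0)    # x_t(i+1, j)
    right = np.roll(x, -1, axis=1)   # x_t(i, j+1)
    up = np.roll(x, 1, axis=0)       # x_t(i-1, j)
    left = np.roll(x, 1, axis=1)     # x_t(i, j-1)
    return (down | right) ^ up ^ left ^ x

def perturbed_step(x, noise):
    """P'(t) = P(t) XOR N(t): CA update followed by the noise perturbation."""
    return pnp2d_step(x) ^ noise

rng = np.random.default_rng(0)
state = rng.random((128, 128)) < 0.5           # random binary starting image
for _ in range(100):
    noise = rng.random(state.shape) < 0.001    # hypothetical sparse noise image N(t)
    state = perturbed_step(state, noise)
print(state.mean())                             # fraction of black pixels
```

Passing an all-white noise image to `perturbed_step` recovers the repeatable, purely pseudo-random sequence mentioned in Section 3.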